Abstract. A compressive sensing method combined with decomposition of a
matrix formed with image frames of a surveillance video into low rank and
sparse matrices is proposed to segment the background and extract moving
objects in a surveillance video. The video is acquired by compressive
measurements, and the measurements are used to reconstruct the video by a low
rank and sparse decomposition of a matrix. The low rank component represents
the background, and the sparse component is used to identify moving objects
in the surveillance video. The decomposition is performed by an augmented
Lagrangian alternating direction method. Experiments are carried out to
demonstrate that moving objects can be reliably extracted with a small
number of measurements.

1. Introduction. In a network of cameras for surveillance, a massive number 
of cameras are deployed, some with wireless connections. The cameras transmit 
surveillance videos to a processing center where the videos are processed and ana- 
lyzed. Of particular interest in surveillance video processing is the ability to detect 
anomalies and moving objects in a scene automatically and quickly. 

Detection of moving objects is traditionally achieved by background subtraction 
methods [1, 21] which segment background and moving objects in a sequence of 
surveillance video frames. The mixture of Gaussians [25] technique assumes that 
each pixel has a distribution that is a sum of Gaussians and the background and 
foreground are modeled by the size of the Gaussians. In low rank and sparse de- 
composition [4], the background is modeled by a low rank matrix, and the moving 
objects are identified by a sparse component. These traditional background
subtraction techniques require all pixels of a surveillance video to be captured,
transmitted and analyzed.

A challenge in the network of cameras is the bandwidth. Since traditional back- 
ground subtraction requires all pixels of video to be acquired, an enormous amount 
of data is transported in the network due to a large number of cameras. At the 
same time, most of the data is uninteresting due to inactivity. There is a high 
risk of the network being overwhelmed by the mostly uninteresting data to prevent 
timely detection of anomalies and moving objects. Therefore, it is highly desirable 
to have a network of cameras in which each camera transmits a small amount of 
data with enough information for reliable detection and tracking of moving objects 
or anomalies. Compressive sensing [5, 11] allows us to achieve this goal. In compres- 
sive sensing, the surveillance cameras make compressive measurements of video and 
transmit measurements in the network. Since the number of measurements is much 
smaller than the total number of pixels, transmission of measurements, instead of 
pixels, helps to prevent network congestion. Furthermore, the lower data rate of 
compressed measurements helps wireless cameras to reduce power consumption. 

When a surveillance video is acquired by compressive measurements, the pixel
values of the video frames are unknown, and consequently, the traditional
background subtraction techniques such as [4, 25] cannot be applied directly. A
straightforward approach is to recover the video from the compressive
measurements [16, 19], and then, after the pixel values are estimated, to apply
one of the known background subtraction techniques. Such an approach is
undesirable for two reasons. First, a generic video reconstruction algorithm
does not take advantage of special characteristics of surveillance video in
which a well defined, relatively static background exists. The existence of a
background provides prior information that helps to reduce the number of
measurements. Second, in the straightforward approach, additional processing is
needed to perform background subtraction after the video is recovered from the
measurements.

In this paper, we propose a method for segmentation of background by using
a low rank and sparse decomposition of a matrix. In this method, the compressive
measurements from a surveillance camera are used to reconstruct the video, which
is assumed to be comprised of a low rank and a sparse component. As in [4], the
low rank component is the background, and the sparse component identifies moving
objects. Therefore, the background subtraction becomes part of the
reconstruction, and no additional processing is needed after reconstruction.
Furthermore, the reconstruction takes advantage of the knowledge that there
exists a background in the video, which helps to reduce the number of
measurements required.

The proposed method is inspired by the work of [4] and extends it to the
measurement domain, rather than the pixel domain, for use in conjunction with
compressive sensing. This method is motivated by [13], where a matrix equation
is solved with the assumption that the solution is a sum of a low rank matrix
and a sparse matrix for 4D-CT reconstruction. Compressive sensing has been used
in background subtraction previously [6], but the method of [6] requires the
pixel values of the background to be known a priori, for example, acquired from
a training process. The method of this paper may be considered to be the
training process in which the compressive measurements are used to obtain the
background.

The paper is organized as follows. In Section 2, the framework for reconstruction
by low rank and sparse decomposition is introduced. The alternating direction
method (ADM) for solving the optimization problem is discussed in Section 3. The
treatment of color components is discussed in Section 4. Finally, experiments are
discussed and results are reported in Section 5.

2. Low rank and sparse decomposition. The framework of our method is 
shown in Figure 1. We first treat the video as black and white, having only the 
luminance component. Color video with R, G, B components will be discussed later. 

Figure 1. Compressive video sensing framework



2.1. Video volume. We consider a video sequence consisting of a number of
frames. Let $x_j \in \mathbb{R}^{n}$ be a vector formed from pixels of frame $j$ of the
video sequence, for $j = 1, \ldots, J$, where $J$ is the total number of frames and
$n$ is the total number of pixels in a frame. Let
$X = [x_1, \ldots, x_J] \in \mathbb{R}^{n \times J}$ be the matrix of dimension $n \times J$,
the columns of which are the frames in the video sequence.

In general, $X = [x_1, \ldots, x_J] \in \mathbb{R}^{n \times J}$ is a video volume obtained
from a video sequence, in which each $x_j$ is a vector formed from pixels of a
sub-region in frame $j$ of the video sequence. The position of the sub-region
within each frame is independent of $j$. The index $n$ is the total number of
pixels in the sub-region. The total number of entries in $X$ is $N = nJ$.

2.2. Compressive measurements. Let $\phi$ be an $M \times N$ measurement matrix with
$M$ rows and $N$ columns, where $M < N$. The measurement matrix $\phi$ may be chosen
as a random matrix such as a randomly permuted Walsh-Hadamard matrix. Let
$\phi = [\phi_1, \ldots, \phi_J]$, where each $\phi_j \in \mathbb{R}^{M \times n}$ is a matrix of
dimension $M \times n$.

The compressive measurements of the video volume are defined as

\[ y = \phi \circ X = \sum_{j=1}^{J} \phi_j x_j, \tag{1} \]

where $y$ is a vector of length $M$. The number of measurements, $M$, is much smaller
than the total number of pixels of the video volume, $N$. The rest of the processing
will only make use of the measurements $y$, without knowing the original video
volume $X$.
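The block structure of (1) can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a small Gaussian matrix as a stand-in for the measurement matrix (an assumption made only to keep the example short). The helper name `measure_blockwise` is hypothetical. It confirms that summing the per-frame products equals multiplying the full matrix by the frames stacked into one long vector.

```python
import numpy as np

def measure_blockwise(phi, X):
    """Compute y = sum_j phi_j x_j with phi = [phi_1, ..., phi_J], X = [x_1, ..., x_J]."""
    n, J = X.shape
    y = np.zeros(phi.shape[0])
    for j in range(J):
        phi_j = phi[:, j * n:(j + 1) * n]   # M x n block acting on frame j
        y += phi_j @ X[:, j]
    return y

rng = np.random.default_rng(0)
n, J, M = 6, 4, 5                       # toy sizes: N = n * J = 24 > M
X = rng.standard_normal((n, J))         # columns are frames
phi = rng.standard_normal((M, n * J))   # stand-in measurement matrix (assumption)

y_blocks = measure_blockwise(phi, X)
y_full = phi @ X.flatten(order="F")     # stack the frames column by column
```

Both `y_blocks` and `y_full` hold the same $M$ measurements, so either view of the operator may be used.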

The process of making compressive measurements may be considered to be an
encoding of the video volume, in which the video volume is encoded by compressive
measurements. In compressive sensing, the encoding is theoretically a
matrix-by-vector multiplication. How well a video $X$ can be recovered from
compressive measurements $y$ depends on the sparsity of $X$ (after a transform)
and the properties of the measurement matrix $\phi$. It is well known that if
$\phi$ satisfies the restricted isometry property (RIP), then the signal $X$ can be
recovered from the measurements $y$ if the number of measurements $M$ is large
enough [5, 11]. Randomly permuted Walsh-Hadamard matrices are shown to have
RIP [23], and such matrices have been successfully used as measurement matrices
in compressive video sensing [16, 19].

Although the measurements are defined by a matrix multiplication, the operation
of matrix-by-vector multiplication is seldom used in practice, because it has a
complexity of $O(MN)$, which may be too expensive for real time applications. When
a randomly permuted Walsh-Hadamard matrix is used as the sensing matrix, the
measurements may be computed by using a fast transform which has complexity
of $O(N \log N)$ [26]. Acquisition of measurements by using other sensing matrices,
such as a circulant matrix generated by a pseudo-random sequence [17], can also be
implemented very efficiently in hardware by using shift registers.
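To illustrate the fast transform, here is a minimal pure-Python sketch of measurements taken with a permuted Walsh-Hadamard matrix. The construction used here (randomly permute the input, apply the fast Walsh-Hadamard transform, keep $M$ randomly chosen coefficients) is one common choice and is an assumption; the constructions in [16, 19, 26] may differ in detail. The function names `fwht` and `measure` are hypothetical.

```python
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a[k], a[k + h] = a[k] + a[k + h], a[k] - a[k + h]
        h *= 2
    return a

def measure(x, m, seed=0):
    """Return m measurements of x using a permuted, row-subsampled transform."""
    rng = random.Random(seed)
    n = len(x)
    perm = list(range(n))
    rng.shuffle(perm)                    # random permutation of the input samples
    rows = rng.sample(range(n), m)       # which transform coefficients to keep
    coeffs = fwht([x[p] for p in perm])  # O(n log n) butterfly, no matrix is formed
    scale = n ** -0.5                    # makes the rows of the operator orthonormal
    return [scale * coeffs[r] for r in rows]
```

The butterfly loop performs $N \log_2 N$ additions and subtractions, compared with $MN$ multiply-adds for an explicit matrix-by-vector product.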

2.3. Reconstruction. Given the measurement vector $y$, the video volume $X$ can
be reconstructed by using the following minimization problem:

\[ X = X_1 + X_2, \tag{2} \]

\[ (X_1, X_2) = \arg\min_{X_1, X_2} \; \mu_1 \|X_1\|_* + \mu_2 \|W^1 X_1\|_1 + \mu_3 \|W^2 X_2\|_1, \quad \text{s.t. } y = \phi \circ X. \tag{3} \]

In (3), $\|A\|_*$ is the nuclear norm of a matrix $A \in \mathbb{R}^{n \times J}$ defined by

\[ \|A\|_* = \mathrm{trace}\big(\sqrt{A^T A}\big) = \sum_{i=1}^{\min(n, J)} \sigma_i, \tag{4} \]

where $\sigma_i$ are the singular values of matrix $A$. The nuclear norm of $A$ is the
$\ell_1$-norm of its singular values. $\|A\|_1$ is the $\ell_1$-norm when $A$ is considered
to be a vector, i.e., $\|A\|_1 = \sum_{i,j} |a_{ij}|$. $\mu_1$, $\mu_2$ and $\mu_3$ are some
nonnegative constants. $W^i$, $i = 1, 2$, are transforms that give sparse
representations of underlying frames or video. The transform used in this paper
is the wavelet frame transform constructed in [9], which will be described later.

In (2), $X_1$ and $X_2$ represent two different components of the reconstructed video
volume. The low rank component $X_1$ is a relatively stationary component, which
represents the background of the video. For example, if $X_1$ has rank one, then
$X_1 = [c_1 x_b, \ldots, c_J x_b] \in \mathbb{R}^{n \times J}$, where $x_b$ is the vector formed from
pixels of the background image, and $c_j$ are some constants. In other words, $X_1$
is made up of a sequence of still images which are scaled images of the
stationary background in the video. Matrix $X_2$ of (2) is the sparse component,
which represents moving objects in the video volume.
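The rank-one background model can be illustrated with synthetic data. The sketch below assumes NumPy and uses made-up sizes and pixel values; it builds a scaled static background plus one bright "moving" pixel per frame and checks the rank and sparsity of the two components.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 50, 20                               # toy sizes (assumption)
x_b = rng.uniform(0.0, 255.0, size=n)       # background image as a vector
c = rng.uniform(0.9, 1.1, size=J)           # per-frame scaling constants c_j
X1 = np.outer(x_b, c)                       # X1 = [c_1 x_b, ..., c_J x_b]

X2 = np.zeros((n, J))
for j in range(J):
    X2[(2 * j) % n, j] = 80.0               # one "moving" pixel per frame

X = X1 + X2                                 # observed video volume
rank_X1 = np.linalg.matrix_rank(X1)
sparsity_X2 = np.count_nonzero(X2) / X2.size
```

Here `X1` has rank one and only 2% of the entries of `X2` are nonzero, while the sum `X` is neither low rank nor sparse on its own.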

2.4. Sparsifying operators. The background image $x_b$ may be sparse in some
transformed space, for example, in a wavelet transform space. Similarly, the moving
objects represented by $X_2$ may have spatial correlations which can be sparsified by
a transform.

The operators $W^i$, $i = 1, 2$ in (3) are sparsifying, spatial operators. Because $W^i$,
$i = 1, 2$ have the same form, for simplicity, we use $W_s$ to denote each of $W^i$,
$i = 1, 2$. For a given matrix $A \in \mathbb{R}^{n \times J}$, the operator $W_s$ works on columns
of the matrix $A$. Specifically, let

\[ A = [a_1, \ldots, a_J], \quad a_j \in \mathbb{R}^{n}. \tag{5} \]

Then the spatial operator is defined as

\[ W_s A = [W_1 a_1, \ldots, W_J a_J], \quad W_j \in \mathbb{R}^{n' \times n}, \quad j = 1, \ldots, J. \tag{6} \]

In other words, the spatial operator $W_s$ is defined by $J$ linear operators that can be
generated by a wavelet decomposition algorithm on each image frame as given by [7].
It can be represented by matrices $W_j$, $j = 1, \ldots, J$ of dimension $n' \times n$, often
with $n' > n$. The operators $W_j$, $j = 1, \ldots, J$ may be different for each of $W^i$,
$i = 1, 2$, but they also may be the same. Furthermore, the matrices $W_j$ may be
identical, i.e., $W_j = W_0$ for all $j = 1, \ldots, J$. The matrices $W_j$ are chosen to be
the tight frame transform as given in [7, 22]. Wavelet frames are used in image
restoration, since they give sparse approximations for many images. More details
on applications of wavelet frames for image restoration can be found in [9, 24].
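As a small illustration of a redundant tight frame with $n' > n$, the sketch below uses a one-level undecimated Haar transform with periodic boundary on a 1-D signal. This is a stand-in chosen for brevity, not the wavelet frame of [7, 9, 22]; the function names are hypothetical. It maps $n$ samples to $n' = 2n$ coefficients, and applying the adjoint recovers the input, i.e., $W^T W = I$.

```python
def haar_frame_analysis(x):
    """Apply W: undecimated one-level Haar, periodic boundary (n -> 2n coefficients)."""
    n = len(x)
    low = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    high = [(x[i] - x[(i + 1) % n]) / 2 for i in range(n)]
    return low + high

def haar_frame_adjoint(coeffs):
    """Apply W^T to a length-2n coefficient vector (negative index wraps around)."""
    n = len(coeffs) // 2
    low, high = coeffs[:n], coeffs[n:]
    return [(low[k] + low[k - 1] + high[k] - high[k - 1]) / 2 for k in range(n)]

x = [4, 2, 6, 8]
coeffs = haar_frame_analysis(x)          # 8 coefficients for 4 samples
x_back = haar_frame_adjoint(coeffs)      # W^T W x, equal to x for a tight frame
flat_high = haar_frame_analysis([5, 5, 5, 5])[4:]   # constant signal: no detail
```

A constant signal produces all-zero detail coefficients, which is the sense in which such a transform sparsifies smooth image regions.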

3. Minimization Algorithm. The minimization problem (3) is a convex problem,
so standard convex optimization algorithms such as interior point methods can
be used. However, these standard methods are computationally expensive, and the
approximate minimizers they produce may not have the required sparsity and low
rank. Instead, as shown in [2], the singular value thresholding method is very
efficient in low rank matrix completion and in low rank and sparse matrix
decomposition. We use a singular value thresholding based first-order method,
the augmented Lagrangian alternating direction method (ADM), for solving this
minimization problem. We remark that this method is similar to the split
Bregman method used in image restoration; see [3, 15] for details.

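We first reformulate the problem (3) into an equivalent problem by introducing
some splitting variables as follows:

\[ \begin{aligned}
\min \;& \mu_1 \|Z_1\|_* + \mu_2 \|Z_2\|_1 + \mu_3 \|Z_3\|_1, \\
\text{s.t. } & X_1 = Z_1, \quad W^1 X_1 = Z_2, \quad W^2 X_2 = Z_3, \\
& \phi \circ (X_1 + X_2) = y.
\end{aligned} \tag{7} \]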

We solve this problem by applying the ADM framework, an iterative procedure 
that minimizes the augmented Lagrangian function in alternating directions and 
updates the Lagrangian multipliers in every iteration. The benefit of the alternating 
minimization approach is that it divides the original problem into some subproblems 
which either have closed form solutions or can be solved efficiently. 
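Specifically, the augmented Lagrangian of problem (7) is given by

\[ \begin{aligned}
\mathcal{L}_A = {} & \mu_1 \|Z_1\|_* + \mu_2 \|Z_2\|_1 + \mu_3 \|Z_3\|_1 \\
& - \langle \lambda_1, X_1 - Z_1 \rangle + \frac{\beta_1}{2} \|X_1 - Z_1\|_F^2 \\
& - \langle \lambda_2, W^1 X_1 - Z_2 \rangle + \frac{\beta_2}{2} \|W^1 X_1 - Z_2\|_F^2 \\
& - \langle \lambda_3, W^2 X_2 - Z_3 \rangle + \frac{\beta_3}{2} \|W^2 X_2 - Z_3\|_F^2 \\
& - \langle \lambda_4, \phi \circ (X_1 + X_2) - y \rangle + \frac{\beta_4}{2} \|\phi \circ (X_1 + X_2) - y\|_2^2,
\end{aligned} \tag{8} \]

where $\lambda_i$ ($i = 1, \ldots, 4$) are Lagrangian multipliers, and $\beta_i > 0$
($i = 1, \ldots, 4$) are penalty parameters.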

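In each iteration, the augmented Lagrangian is minimized over the $X$- and
$Z$-directions alternately, and the Lagrangian multipliers are updated by the
following simple scheme:

\[ \begin{aligned}
\lambda_1 &\leftarrow \lambda_1 - \gamma \beta_1 (X_1 - Z_1), \\
\lambda_2 &\leftarrow \lambda_2 - \gamma \beta_2 (W^1 X_1 - Z_2), \\
\lambda_3 &\leftarrow \lambda_3 - \gamma \beta_3 (W^2 X_2 - Z_3), \\
\lambda_4 &\leftarrow \lambda_4 - \gamma \beta_4 [\phi \circ (X_1 + X_2) - y],
\end{aligned} \tag{9} \]

where $\gamma > 0$ is a step-length.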

Clearly, the $(X_1, X_2)$-subproblem, i.e., to minimize (8) over $(X_1, X_2)$, is a
convex quadratic problem, which reduces to solving a linear system
$\nabla_{(X_1, X_2)} \mathcal{L}_A = 0$. When the sparsifying transforms $W^i$ form tight
frames, i.e., $(W^i)^T W^i = I$ ($i = 1, 2$), and the rows of the measurement matrix
are orthonormal, i.e., $\phi \phi^T = I$, the linear system can be solved by the Schur
complement and the Sherman-Morrison-Woodbury formula, in which the major
computations are only matrix-vector multiplications without inversion of a
linear system. In other cases, solving a linear system may be too expensive for
large scale data. However, the linear system can be solved approximately, e.g.,
by just taking a steepest descent step, and empirical evidence shows that the
algorithm still converges well.
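To make the linear system concrete, the following derivation sketches its form by setting the gradient of (8) to zero. It writes $\lambda_i$ for the multipliers and $\phi^{*}$ for the adjoint of the map $X \mapsto \phi \circ X$; the symbol $\phi^{*}$ is a notation introduced here for illustration. It is a worked restatement under the tight frame assumption $(W^i)^T W^i = I$, not a description of the authors' implementation.

```latex
% Stationarity of (8) with respect to X_1 and X_2:
\begin{aligned}
0 &= -\lambda_1 + \beta_1 (X_1 - Z_1) - (W^1)^T \lambda_2
     + \beta_2 (W^1)^T (W^1 X_1 - Z_2)
     - \phi^{*} \lambda_4 + \beta_4\, \phi^{*}\big(\phi \circ (X_1 + X_2) - y\big), \\
0 &= -(W^2)^T \lambda_3 + \beta_3 (W^2)^T (W^2 X_2 - Z_3)
     - \phi^{*} \lambda_4 + \beta_4\, \phi^{*}\big(\phi \circ (X_1 + X_2) - y\big).
\end{aligned}

% Using (W^i)^T W^i = I and collecting the unknowns on the left:
\begin{aligned}
(\beta_1 + \beta_2) X_1 + \beta_4\, \phi^{*}\big(\phi \circ (X_1 + X_2)\big)
  &= \lambda_1 + \beta_1 Z_1 + (W^1)^T (\lambda_2 + \beta_2 Z_2)
     + \phi^{*} (\lambda_4 + \beta_4 y), \\
\beta_3 X_2 + \beta_4\, \phi^{*}\big(\phi \circ (X_1 + X_2)\big)
  &= (W^2)^T (\lambda_3 + \beta_3 Z_3) + \phi^{*} (\lambda_4 + \beta_4 y).
\end{aligned}
```

Subtracting the second equation from the first removes the coupled term and expresses $(\beta_1 + \beta_2) X_1 - \beta_3 X_2$ through known quantities. What remains is a system of the form $(aI + b\,\phi^{*}\phi)$, whose inverse is $\frac{1}{a}\big(I - \frac{b}{a+b}\,\phi^{*}\phi\big)$ when $\phi\phi^{*} = I$, so only applications of $\phi$ and $\phi^{*}$ are needed.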

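Note that the variables $Z_1$, $Z_2$, $Z_3$ are separable in the augmented Lagrangian
function $\mathcal{L}_A$. Therefore, minimizing $\mathcal{L}_A$ over $(Z_1, Z_2, Z_3)$ boils down to
minimizing over each $Z_i$ ($i = 1, 2, 3$) separately, i.e.,

\[ Z_1 = \arg\min_{Z_1} \; \mu_1 \|Z_1\|_* + \frac{\beta_1}{2} \|Z_1 - (X_1 - \lambda_1/\beta_1)\|_F^2, \tag{10} \]

\[ Z_2 = \arg\min_{Z_2} \; \mu_2 \|Z_2\|_1 + \frac{\beta_2}{2} \|Z_2 - (W^1 X_1 - \lambda_2/\beta_2)\|_F^2, \tag{11} \]

\[ Z_3 = \arg\min_{Z_3} \; \mu_3 \|Z_3\|_1 + \frac{\beta_3}{2} \|Z_3 - (W^2 X_2 - \lambda_3/\beta_3)\|_F^2. \tag{12} \]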

All of them are known to have closed form solutions. The subproblem (10) can be
solved by the so-called singular value thresholding (SVT), i.e.,

\[ Z_1 = D_{\mu_1/\beta_1}(X_1 - \lambda_1/\beta_1), \tag{13} \]

where $D_\tau(\cdot)$ denotes the SVT operator as follows:

\[ D_\tau(X) = U \cdot \mathrm{diag}(\max(\sigma - \tau, 0)) \cdot V^T, \tag{14} \]

and $X = U \cdot \mathrm{diag}(\sigma) \cdot V^T$ is the singular value decomposition (SVD) of the
input matrix $X$. The subproblems (11) and (12) can be solved by the so-called
shrinkage formula. Let

\[ T_\tau(x) = \mathrm{sgn}(x) \cdot \max(|x| - \tau, 0) \tag{15} \]

denote the shrinkage operator; then the solutions to (11) and (12) are given by

\[ Z_2 = T_{\mu_2/\beta_2}(W^1 X_1 - \lambda_2/\beta_2), \tag{16} \]

\[ Z_3 = T_{\mu_3/\beta_3}(W^2 X_2 - \lambda_3/\beta_3). \tag{17} \]
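The two closed-form operators are short enough to write out. The sketch below assumes NumPy; the function names `svt` and `shrink` are hypothetical. `svt` thresholds the singular values as in (14), and `shrink` applies the elementwise soft threshold of (15).

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink each singular value of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt   # scales the columns of U

def shrink(X, tau):
    """Elementwise soft threshold: sgn(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

A = np.diag([3.0, 1.0])                            # singular values 3 and 1
Z1 = svt(A, 2.0)                                   # singular values become 1 and 0
Z2 = shrink(np.array([3.0, -0.5, -2.0]), 1.0)      # entries below tau vanish
```

With threshold 2, the smaller singular value is removed and the rank of `Z1` drops to one, which is how the step promotes a low rank background.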






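The iterative scheme of the algorithm is summarized below.

Algorithm 1: ADM for Low-rank and Sparse Decomposition

1 Initialize $X_1$, $X_2$, $Z_1$, $Z_2$, $Z_3$, $\lambda_i$, $\beta_i$ ($i = 1, \ldots, 4$) and $\gamma$;
2 while stopping criterion is not met do
    compute $(X_1, X_2)$ from $\nabla_{(X_1, X_2)} \mathcal{L}_A = 0$;
    $Z_1 = D_{\mu_1/\beta_1}(X_1 - \lambda_1/\beta_1)$;
    $Z_2 = T_{\mu_2/\beta_2}(W^1 X_1 - \lambda_2/\beta_2)$;
    $Z_3 = T_{\mu_3/\beta_3}(W^2 X_2 - \lambda_3/\beta_3)$;
    update $\lambda_i$ ($i = 1, \ldots, 4$) by (9);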

Following from existing ADM theory, the algorithm has global convergence if
$\beta_i > 0$ ($i = 1, \ldots, 4$) and $0 < \gamma < (\sqrt{5} + 1)/2$; see [14].



4. Color components. So far, we have only considered the luminance component 
of a video volume, treated as a single matrix or vector. A color video has multiple 
color components, such as RGB components, corresponding to multiple matrices 
or vectors. Although each color component can be dealt with individually using 
our previous model, this approach certainly does not exploit the high correlations 
between different color components. Therefore, we want to take advantage of these 
correlations between color components and develop a joint reconstruction procedure. 
Compressive sensing using joint sparsity has previously been considered in, for
example, [8, 12].



4.1. Correlations. Let matrices $X_i^{(1)}, X_i^{(2)}, X_i^{(3)} \in \mathbb{R}^{n \times J}$, $i = 1, 2$,
denote the R, G, B components of the low rank ($i = 1$) and sparse ($i = 2$)
components, respectively. We then define a joint matrix of the color video for
each of the low rank and sparse components by

\[ X_i = [(X_i^{(1)})^T, (X_i^{(2)})^T, (X_i^{(3)})^T]^T \in \mathbb{R}^{3n \times J}, \quad i = 1, 2. \tag{18} \]

The correlations between color components can be considered in the following
aspects. First, the linear dependency of the background is often similar for
different color components. As we mentioned, each $X_1^{(i)}$ ($i = 1, 2, 3$) tends to
be low rank, and their columns are linearly dependent in a similar way for
different colors. That means that, by stacking them into a big matrix, the rank
of $X_1$ will remain almost as low as that of each $X_1^{(i)}$ ($i = 1, 2, 3$). Second,
different color components are likely to have similar sparsity structure under
some sparsifying basis. For instance, if we apply a wavelet frame transform to
each color component, the large wavelet coefficients correspond to those
locations with sharp changes of pixel values, i.e., the edges. Note that the
edges of an image are usually preserved across different color components.
Therefore, under wavelet frame transforms, different color components become
jointly sparse, sharing the same support locations. We therefore define the
sparsifying operators on the joint components by

\[ W^i X_i = [(W^i X_i^{(1)})^T, (W^i X_i^{(2)})^T, (W^i X_i^{(3)})^T]^T \in \mathbb{R}^{3n' \times J}, \quad i = 1, 2, \tag{19} \]

where $n'$ and $W^i$, $i = 1, 2$ are defined in (6).
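The claim that stacking keeps the rank low can be checked on an idealized case. The sketch below assumes NumPy and synthetic data in which all three color backgrounds share the same per-frame scaling constants (the simplest form of "linearly dependent in a similar way"); real video would satisfy this only approximately.

```python
import numpy as np

rng = np.random.default_rng(2)
n, J = 40, 15                                    # toy sizes (assumption)
c = rng.uniform(0.9, 1.1, size=J)                # scaling shared by R, G and B
backgrounds = [rng.uniform(0.0, 255.0, size=n) for _ in range(3)]
channels = [np.outer(b, c) for b in backgrounds] # rank-one matrix per color

X1_joint = np.vstack(channels)                   # joint matrix as in (18)
ranks = [int(np.linalg.matrix_rank(ch)) for ch in channels]
joint_rank = int(np.linalg.matrix_rank(X1_joint))
```

The joint matrix has three times as many rows as each channel but the same rank, so the nuclear norm penalty on the joint matrix exploits the shared structure.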
4.2. Joint reconstruction. We extend our model (3) to deal with the joint
reconstruction of multiple color components as below:

\[ \min_{X_1, X_2} \; \mu_1 \|X_1\|_* + \mu_2 \|W^1 X_1\|_{2,1} + \mu_3 \|W^2 X_2\|_{2,1}, \quad \text{s.t. } \phi \circ (X_1 + X_2) = y. \tag{20} \]

The mixed $\ell_{2,1}$-norm is defined to take into consideration that now each pixel
has three color components, or after the sparsifying operator, each transformed
coefficient is a 3-vector having three components (for R, G and B,
respectively). In the mixed $\ell_{2,1}$-norm, therefore, the 2-norm of the 3-vector is
computed first, and then the 1-norm of the transformed coefficients is formed by

\[ \|X\|_{2,1} = \sum_{i=1}^{N'} \sqrt{\sum_{j=1}^{3} \big(X^{(j)}(i)\big)^2}, \tag{21} \]

for any $X = [(X^{(1)})^T, (X^{(2)})^T, (X^{(3)})^T]^T$. In (21), $N'$ is the total number of
entries in each of $X^{(j)}$, $j = 1, 2, 3$. Note that $N'$ may be different from $N$
because the sparsifying operator may be redundant. Since this $\ell_{2,1}$-norm is
known to promote joint sparsity in the solution, we use it to encode the feature
that color components have the same sparsity pattern under a certain basis. The
nuclear norm $\|\cdot\|_*$ is applied to the joint matrix $X_1$ to exploit the
correlations of background linear dependency for different color components.
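A short numerical example shows why the mixed norm favors a shared support. The pure-Python sketch below uses hypothetical function names and made-up coefficient values. It compares two coefficient sets with the same plain 1-norm: one where the R and G coefficients are nonzero at the same location, and one where they sit at different locations.

```python
import math

def l21_norm(r, g, b):
    """Sum over locations of the 2-norm of the (R, G, B) coefficient triple."""
    return sum(math.sqrt(ri * ri + gi * gi + bi * bi)
               for ri, gi, bi in zip(r, g, b))

def l1_norm(r, g, b):
    """Plain 1-norm over all coefficients of all channels."""
    return sum(abs(v) for channel in (r, g, b) for v in channel)

shared = ([3, 0], [4, 0], [0, 0])      # R and G nonzero at the same location
disjoint = ([3, 0], [0, 4], [0, 0])    # R and G nonzero at different locations

shared_l21 = l21_norm(*shared)         # sqrt(9 + 16) = 5
disjoint_l21 = l21_norm(*disjoint)     # 3 + 4 = 7
```

Both sets have 1-norm 7, but the shared-support set has the smaller mixed norm, so minimizing it pushes the color components toward a common sparsity pattern.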



4.3. Algorithm. The joint reconstruction model (20) can be solved efficiently by
the ADM approach. Applying a similar splitting technique, we transform (20) into
an equivalent problem:

\[ \begin{aligned}
\min \;& \mu_1 \|Z_1\|_* + \mu_2 \|Z_2\|_{2,1} + \mu_3 \|Z_3\|_{2,1}, \\
\text{s.t. } & X_1 = Z_1, \quad W^1 X_1 = Z_2, \quad W^2 X_2 = Z_3, \\
& \phi \circ (X_1 + X_2) = y.
\end{aligned} \tag{22} \]

Following the same procedure, we derive an ADM algorithm as is summarized below. 



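Algorithm 2: ADM for Joint Low-rank and Joint Sparse Decomposition

1 Initialize $X_1$, $X_2$, $Z_1$, $Z_2$, $Z_3$, $\lambda_i$, $\beta_i$ ($i = 1, \ldots, 4$) and $\gamma$;
2 while stopping criterion is not met do
    compute $(X_1, X_2)$ from $\nabla_{(X_1, X_2)} \mathcal{L}_A = 0$;
    $Z_1 = D_{\mu_1/\beta_1}(X_1 - \lambda_1/\beta_1)$;
    $Z_2 = S_{\mu_2/\beta_2}(W^1 X_1 - \lambda_2/\beta_2)$;
    $Z_3 = S_{\mu_3/\beta_3}(W^2 X_2 - \lambda_3/\beta_3)$;
    update $\lambda_i$ ($i = 1, \ldots, 4$):
        $\lambda_1 \leftarrow \lambda_1 - \gamma \beta_1 (X_1 - Z_1)$;
        $\lambda_2 \leftarrow \lambda_2 - \gamma \beta_2 (W^1 X_1 - Z_2)$;
        $\lambda_3 \leftarrow \lambda_3 - \gamma \beta_3 (W^2 X_2 - Z_3)$;
        $\lambda_4 \leftarrow \lambda_4 - \gamma \beta_4 [\phi \circ (X_1 + X_2) - y]$;
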
Here, $S_\tau(\cdot)$ represents a pixel-wise shrinkage operator, i.e.,

\[ Z = S_\tau(X) \iff z_{ij} = \max(\|x_{ij}\|_2 - \tau, 0) \cdot \frac{x_{ij}}{\|x_{ij}\|_2}, \quad \forall\, i, j, \tag{23} \]

where $x_{ij}$ denotes the 3-vector from the input $X$ defined by
$x_{ij} = (x_{ij}^{(1)}, x_{ij}^{(2)}, x_{ij}^{(3)})^T$, and $z_{ij}$ is similarly defined. The same
global convergence result follows from [14].
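A pixel-wise shrinkage of this kind is commonly implemented as a group soft threshold: the length of each 3-vector is reduced by $\tau$ while its direction is kept. The pure-Python sketch below shows that standard form for a single 3-vector; the function name is hypothetical, and treating vectors of length at most $\tau$ (including the zero vector) as mapping to zero is the usual convention, assumed here.

```python
import math

def group_shrink(x, tau):
    """Shrink the 2-norm of the vector x by tau, keeping its direction."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= tau:                      # also covers the zero vector safely
        return [0.0 for _ in x]
    scale = (norm - tau) / norm
    return [scale * v for v in x]

z_large = group_shrink([3.0, 4.0, 0.0], 1.0)   # norm 5 is reduced to 4
z_small = group_shrink([0.3, 0.4, 0.0], 1.0)   # norm 0.5 is below tau
```

All three color components of a coefficient are kept or removed together, which matches the joint sparsity encoded by the mixed norm.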

5. Numerical Experiment. In this section, we present results from four
numerical experiments.

5.1. Experiment setup. The surveillance video sequences, Browse2,
ShopAssistant1Front, Traffic and DanieLlight, are obtained from databases that
are publicly available on the web [10, 20, 27].

For each video sequence, a number of frames, ranging from 100 to 190, are
selected to form a video volume. A permuted Walsh-Hadamard matrix is used
to make measurements of the video volume. The number of measurements used
in the reconstruction by low rank and sparse decomposition is expressed as a
percentage of the total number of pixels in the video volume. For example, 100%
means the number of measurements is equal to the total number of pixels in the
video volume.

In the reconstruction, the parameters $\mu_1$, $\mu_2$, $\mu_3$ are fixed for all four
experiments, and they are given by

\[ \mu_1 = 1, \quad \mu_2 = 0, \quad \mu_3 = 10^{-3}. \tag{24} \]
The parameter $\mu_2$ in (3) and (20) controls the amount of constraint imposed
on the sparsity of the low rank component $X_1$. For the method of this paper, the
constraint of low rank on $X_1$ is sufficient to produce a high quality
reconstruction of the low rank component with a relatively small number of
measurements. The low rank component is common to most of the frames, and even
though the percentage of measurements is small, there is a large amount of
information about the low rank component $X_1$ in the measurements if the video
volume has a large number of frames. For this reason, the parameter $\mu_2$ does
not play an important role, and therefore, it is set to zero in the experiments
of this paper. However, we introduce $\mu_2$ in (3) and (20) for a general
framework, which can also be used in an adaptive method for real-time
processing; see [18]. In an adaptive method for real-time processing, the
constraint of low rank alone is not sufficient to produce a high quality low
rank component $X_1$ because the number of frames in the video is small.
Therefore, the additional constraint of sparsity on $X_1$ becomes important.
The effect of $\mu_2$ is discussed in detail in [18].

For each experiment, we report the PSNR of the reconstructed video,
$X = X_1 + X_2$. The experiments are summarized in the following table.
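The text does not spell out how PSNR is computed. The pure-Python sketch below uses the standard definition, $10 \log_{10}(\text{peak}^2 / \text{MSE})$, and assumes 8-bit pixel values with a peak of 255; both the peak value and the function name are assumptions made for illustration.

```python
import math

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf                  # identical signals
    return 10.0 * math.log10(peak * peak / mse)

off_by_one = psnr([0, 0, 0, 0], [1, 1, 1, 1])   # MSE of 1 gray level squared
```

For a video volume, the two lists would hold all pixels of the original and of the reconstructed volume; an error of one gray level everywhere gives roughly 48.13 dB.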



Table 1. Summary of experiments

Name               Browse      Shop        Traffic     Daniel
Resolution         384 x 288   384 x 258   378 x 282   320 x 240
Frames             100         120         190         130
Measurements (%)   4           4           6.67        10
PSNR (dB)          32.1        36.3        36.5        30.4
Rank of X_1        1           1           1           3

To demonstrate the capability of detecting moving objects, we will display the
images of the background $X_1$, and the silhouette of the moving objects obtained
from the sparse component $X_2$. The silhouette of the $n$-th frame, $S_n$, is a binary
image obtained from $X_2$ by the following equation:

\[ S_n = T_s(\mathrm{Med}(X_2(n))). \tag{25} \]

In (25), $X_2(n)$ is frame $n$ of the sparse component $X_2$, $\mathrm{Med}(\cdot)$ is a median
filter, and $T_s(\cdot)$ is a threshold operator defined as

\[ T_s(X)(i,j) = \begin{cases} 1, & \text{if } |X(i,j)| \ge \delta, \\ 0, & \text{if } |X(i,j)| < \delta. \end{cases} \tag{26} \]
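The silhouette step can be sketched in a few lines of pure Python. The median filter window (3 x 3), the border handling (replicating edge pixels), the threshold value and the function names are all assumptions, since the text does not specify them.

```python
import statistics

def median_filter3(frame):
    """3 x 3 median filter with edge replication on a list-of-lists image."""
    rows, cols = len(frame), len(frame[0])

    def at(i, j):
        # Clamp indices so that border pixels reuse their nearest neighbor.
        return frame[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[statistics.median(at(i + di, j + dj)
                               for di in (-1, 0, 1) for dj in (-1, 0, 1))
             for j in range(cols)]
            for i in range(rows)]

def silhouette(frame, delta):
    """Binary image: 1 where the filtered magnitude reaches delta, else 0."""
    filtered = median_filter3(frame)
    return [[1 if abs(v) >= delta else 0 for v in row] for row in filtered]

spike = [[0, 0, 0], [0, 100, 0], [0, 0, 0]]          # isolated noisy pixel
blob = [[10 if 1 <= i <= 3 and 1 <= j <= 3 else 0 for j in range(5)]
        for i in range(5)]                            # 3 x 3 moving object
```

The isolated spike is removed entirely by the median filter, while the 3 x 3 object survives with its corners trimmed, which is the intended effect of filtering before thresholding.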



5.2. Browse2. Browse2 [27] is a color sequence from a camera monitoring a
building lobby. The original is an MPEG file of resolution 384 x 288 and more
than 6 minutes in length. We take 100 frames from the sequence, and process only
the luminance component. The total number of pixels is
$N = 384 \times 288 \times 100 = 11059200$. The total number of measurements used in the
reconstruction is 1/25 (4%) of the total number of pixels, i.e., the total
number of measurements is $M = 442368$.
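The counts quoted for Browse2 follow directly from the resolution, the number of frames and the 1/25 ratio. The short sketch below reproduces the arithmetic; the function name is hypothetical.

```python
def measurement_count(width, height, frames, ratio_denominator):
    """Return (N, M): total pixels and measurements for a 1/ratio_denominator rate."""
    total_pixels = width * height * frames
    return total_pixels, total_pixels // ratio_denominator

N, M = measurement_count(384, 288, 100, 25)   # Browse2 at 4% (1/25)
```

The same helper applies to the other sequences by substituting their resolution, frame count and measurement ratio from Table 1.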

A typical frame, Frame 18, is shown in Figure 2. The frame from the original is
shown in the center (b), the reconstructed background (the low rank component)
is shown on the left (a), and the silhouette of the reconstructed moving objects
(the sparse component) is shown on the right (c).




Figure 2. Frame 18 of Browse2 sequence. Total number of
measurements is 4% of the total number of pixels. (a) Reconstructed
background, (b) Original frame, (c) Silhouette of the
reconstructed moving objects.



5.3. ShopAssistant1Front. ShopAssistant1Front [27] is a color sequence from a
camera in a shopping mall. The original is an MPEG file of resolution 384 x 258
and about 1 minute in length. We take 120 frames from the sequence, and process
only the luminance component. The total number of measurements used in the
reconstruction is 4% of the total number of pixels. Frame 115 is shown in
Figure 3.

It is worthwhile to note that even with only 4% measurements, we are able to
extract the people moving inside the shops both above and below the main floor.
This is shown by the small white dot in the upper middle region of Figure 3 (c),
which represents shoppers in the shop above the floor behind the shelf.

Figure 3. Frame 115 of ShopAssistant1Front sequence. Total number of
measurements is 4% of the total number of pixels. (a) Reconstructed
background, (b) Original frame, (c) Silhouette of the reconstructed moving
objects.

5.4. Traffic. Traffic [20] is a black and white sequence from a traffic camera in a
highway intersection. The original is a sequence of 190 JPEG frames of resolution
378 x 282. The total number of measurements used in the reconstruction is 6.67%
(1/15) of the total number of pixels. Frame 155 is shown in Figure 4.




Figure 4. Frame 155 of Traffic sequence. Total number of
measurements is 6.67% of the total number of pixels. (a) Reconstructed
background, (b) Original frame, (c) Silhouette of the
reconstructed moving objects.

As can be seen from Figure 4, with 6.67% measurements, all moving vehicles are 
removed from the background (a). All vehicles, except one, are detected in (c). The 
undetected car is in the lane going north (up), close to the intersection in the upper 
middle region of Figure 4 (b). The color (intensity) of the car is very close to that of 
the road, and the car is indistinguishable from the noise in the reconstructed sparse 
component. 

5.5. DanieLlight. DanieLlight [10] is a color sequence from a camera monitoring
an office. The original is a WMV file of resolution 320 x 240 and about 30 seconds
in length. We take 130 frames from the sequence and process the full color with
joint color components. Within the sequence, Daniel walks into the office while
the light is on, turns off the light and walks out. Therefore, there is an
illumination change in the sequence. Frames 22 and 102 are shown in Figure 5.

Frame 22, the top row of Figure 5, shows Daniel walking in while the light is
on. In Frame 102, the bottom row of Figure 5, Daniel walks out after turning off
the light. The background in Figure 5 (a) is well captured when the light is
either on or off. Note that the illumination change in this sequence is not a
simple scaling of the background: only the light in front is turned off. The
significance of this experiment is that in our method, the change in the
illumination is not detected as part of moving objects.

Figure 5. Frames 22 and 102 of DanieLlight sequence. Total number of
measurements is 10% of the total number of pixels. Top: frame 22. Bottom:
frame 102. (a) Reconstructed background, (b) Original frame, (c) Silhouette of
the reconstructed moving objects.

6. Conclusion. Low rank and sparse decomposition is an effective method for
processing surveillance video when it is combined with compressive sensing. It
is well suited to reconstructing surveillance video because it takes
advantage of the well defined low rank and sparse components in the surveillance
video signal. The background subtraction and moving object extraction come from
the process of reconstruction at no additional cost. We have demonstrated by
experiments that moving objects can be reliably extracted by using a small
number of measurements.

The method proposed in this paper is an "offline" method, meaning that the
processing is done after a large number of frames are acquired (using compressive
measurements), and therefore, it is not done in real time. It is possible to extend
the concept of this paper to "online", real time processing by adaptively updating
the low rank component. This will be investigated in detail in a future paper.